Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence
Abstract
Multi-objective problems with correlated objectives are a class of problems that deserves specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-)optimal solutions for any one objective are (near-)optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is highly relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal-difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems.
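The abstract describes adaptive objective selection only at a high level, so a minimal sketch may help make the mechanics concrete. The sketch below is an assumption-laden reconstruction in plain Python, not the paper's implementation: the class name, the tabular Q-learners, and in particular the confidence measure (negative variance of the greedy Q-value across a small ensemble of independently initialized learners) are illustrative stand-ins; the paper derives its confidence metric from the estimates maintained by its own function approximator. Each objective corresponds to one (shaped) reward signal, and all objectives are learned in parallel from the same transitions.

import numpy as np

class AdaptiveObjectiveSelector:
    """Hypothetical sketch of adaptive objective selection (names assumed)."""

    def __init__(self, n_states, n_actions, n_objectives,
                 ensemble_size=5, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions
        # Q[o, k, s, a]: Q-estimate of ensemble member k for objective o.
        rng = np.random.default_rng(0)
        self.Q = rng.normal(0.0, 0.01,
                            size=(n_objectives, ensemble_size, n_states, n_actions))

    def select_action(self, s, rng):
        """Pick the most confident objective in state s, act greedily on it."""
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))
        mean_q = self.Q[:, :, s, :].mean(axis=1)       # (n_objectives, n_actions)
        greedy = mean_q.argmax(axis=1)                 # greedy action per objective
        # Confidence stand-in: low ensemble variance of the greedy Q-value.
        var = self.Q[np.arange(self.Q.shape[0]), :, s, greedy].var(axis=1)
        o = int(var.argmin())                          # most confident objective
        return int(mean_q[o].argmax())

    def update(self, s, a, rewards, s_next, done):
        """One Q-learning step per objective; 'rewards' has one entry per objective."""
        for o, r in enumerate(rewards):
            for k in range(self.Q.shape[1]):
                target = r if done else r + self.gamma * self.Q[o, k, s_next].max()
                self.Q[o, k, s, a] += self.alpha * (target - self.Q[o, k, s, a])

# A typical step, with shaping functions F_i supplying the correlated objectives:
#   a = agent.select_action(s, rng)
#   rewards = [r_env] + [r_env + F_i(s, s_next) for F_i in shaping_fns]
#   agent.update(s, a, rewards, s_next, done)

If the F_i are potential-based shaping functions, every objective shares the base problem's optimal policies, which is what makes the objectives correlated rather than conflicting; the selector then simply acts on whichever signal it currently trusts most.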
Similar Resources
Multi-Objectivization in Reinforcement Learning
Multi-objectivization is the process of transforming a single objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that the addition of objectives that are correlated with the original objective can make the resulting problem easier to solve compared to the original single-objective problem. In this paper we investigate the multi-objectivizati...
Dynamic shaping of dopamine signals during probabilistic Pavlovian conditioning.
Cue- and reward-evoked phasic dopamine activity during Pavlovian and operant conditioning paradigms is well correlated with reward-prediction errors from formal reinforcement learning models, which feature teaching signals in the form of discrepancies between actual and expected reward outcomes. Additionally, in learning tasks where conditioned cues probabilistically predict rewards, dopamine n...
Combining manual feedback with subsequent MDP reward signals for reinforcement learning
As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the tamer framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on tamer showed that...
A Robust Desirability-based Approach to Optimizing Multiple Correlated Responses
There are many real problems in which multiple responses should be optimized simultaneously by setting process variables. One of the common approaches for optimizing multi-response problems is the desirability function. In most real cases, there is a correlation structure between responses, so ignoring the correlation may lead to mistaken results. Hence, in this paper a robust approach based ...
Reward modulates adaptations to conflict.
Both cognitive conflict (e.g. Verguts & Notebaert, 2009) and reward signals (e.g. Waszak & Pholulamdeth, 2009) have been proposed to enhance task-relevant associations. Bringing these two notions together, we predicted that reward modulates conflict-based sequential adaptations in cognitive control. This was tested combining either a single flanker task (Experiment 1) or a task-switch paradigm ...
Publication year: 2014